Active Learning with Distributional Estimates
Abstract
Active Learning (AL) is increasingly important in a broad range of applications. Two main AL principles for obtaining accurate classification from few labeled examples are refinement of the current decision boundary and exploration of poorly sampled regions. In this paper we derive a novel AL scheme that balances these two principles in a natural way. In contrast to many AL strategies, which rely on a point estimate p̂(y|x) of the class conditional probability, a key component of our approach is to view this quantity as a random variable, thereby explicitly accounting for the uncertainty in its estimated value. Our main contribution is a novel mathematical framework for uncertainty-based AL, together with a corresponding AL scheme, in which the uncertainty in p̂(y|x) is modeled by a second-order distribution. On the practical side, we show how to approximate such second-order distributions for kernel density classification. Finally, we find that across a large number of UCI, USPS and Caltech-4 datasets, our AL scheme achieves significantly better learning curves than popular AL methods such as uncertainty sampling and error reduction sampling, when all methods use the same kernel density classifier.
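To make the idea of a second-order distribution concrete, the following is a minimal Python sketch and not the scheme derived in the paper. It assumes a binary problem, a Gaussian kernel, and a Beta distribution over p(y=1|x) whose parameters are kernel-weighted label counts plus a uniform prior. The function names, the bandwidth, and the Beta model itself are illustrative assumptions. The sketch shows that two query points can share the same point estimate of 0.5 yet differ in how uncertain that estimate is.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Unnormalised Gaussian kernel weight between two points (sequences of floats)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def beta_posterior(x, labeled, bandwidth=1.0, prior=1.0):
    """Return (alpha, beta) of a Beta distribution over p(y=1|x).

    ASSUMPTION (not from the paper): kernel weights act as fractional
    pseudo-counts added to a symmetric Beta(prior, prior) prior.
    `labeled` is a list of (point, label) pairs with labels in {0, 1}.
    """
    alpha, beta = prior, prior
    for xi, yi in labeled:
        w = gaussian_kernel(x, xi, bandwidth)
        if yi == 1:
            alpha += w
        else:
            beta += w
    return alpha, beta


def beta_mean_var(alpha, beta):
    """Mean and variance of Beta(alpha, beta)."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, var


# Toy data: five points of class 0 at x = -1 and five of class 1 at x = +1.
labeled = [((-1.0,), 0)] * 5 + [((1.0,), 1)] * 5

# A well-sampled point on the decision boundary and a point far from all data.
for name, query in [("boundary", (0.0,)), ("unexplored", (10.0,))]:
    mean, var = beta_mean_var(*beta_posterior(query, labeled))
    print(f"{name}: mean={mean:.3f}, variance={var:.4f}")
```

Both queries have a mean close to 0.5, so a criterion based only on p̂(y|x) cannot separate them. The variance of the Beta distribution is larger at the unexplored point, which is the kind of information a second-order distribution makes available for trading off boundary refinement against exploration.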
Related resources
An Adaptive Strategy for Active Learning with Smooth Decision Boundary
We present the first adaptive strategy for active learning in the setting of classification with smooth decision boundary. The problem of adaptivity (to unknown distributional parameters) has remained open since the seminal work of Castro and Nowak (2007), which first established (active learning) rates for this setting. While some recent advances on this problem establish adaptive rates in t...
Heuristics in exploration: Distributional information is selectively used for active learning
Everyday decision-making is filled with choices about what to act on, with outcomes playing a critical role in learning. Information gain is oft cited as a valuable approach to maximize potential learning, but its computation is costly. It entails evaluating the probability of multiple outcomes given any possible action, and then considering the degree of belief-change over all possibilities. G...
Distributional Term Set Expansion
This paper is a short empirical study of the performance of centrality and classification based iterative term set expansion methods for distributional semantic models. Iterative term set expansion is an interactive process using distributional semantics models where a user labels terms as belonging to some sought after term set, and a system uses this labeling to supply the user with new, cand...
Improving English Named Entity Recognition
In recent years much of the work in named entity recognition has been focused on tackling entities in different languages or domains. However the task of English named entity recognition still remains to be solved. In this paper we explore more ways to improve the English named entity recognition system beyond just distributional semantics and the use of external gazetteer. These ways include p...
Fast phonetic learning occurs already in 2-to-3-month old infants: an ERP study
An important mechanism for learning speech sounds in the first year of life is "distributional learning," i.e., learning by simply listening to the frequency distributions of the speech sounds in the environment. In the lab, fast distributional learning has been reported for infants in the second half of the first year; the present study examined whether it can also be demonstrated at a much yo...